Abstract

The current consensus is that the majority of proteins act in concert in the cell, as homo- and 
heteromeric complexes of two or more proteins that carry out discrete biological functions. A wide 
range of genomic, proteomic, biochemical, structural and biophotonic techniques have been 
employed over the years to study the protein-protein interactions that define complexes, with the 
end goal of producing a spatiotemporal map of these modular functional units throughout the cell. 
Recent advances in the analysis of in vivo complexes have greatly improved structural, functional and 
temporal resolution, and this review highlights novel approaches ranging from proximity-dependent 
labeling and cross-linking/mass spectrometry through pulse-chase epitope labeling and targeted 
protein degradation. 



Structural analyses of multiprotein complexes 

"No man is an island entire of itself" (John Donne). 

With the assembly of proteins into functional complexes 
thought to underlie most, if not all, biological processes, 
characterization of these structures is a key goal in cell 
biology. An initial step is identification of complex 
members, both stably and transiently associated, and their 
intra-complex interactions. The current method of choice for 
interactome analyses is affinity purification followed by 
mass spectrometry (AP-MS; for review see [1-3]), and the 
development of quantitative methods has enabled resolu- 
tion of the components of large multiprotein complexes 
and provided information about their stoichiometry [4-7]. 
What this type of approach on its own does not provide, 
however, is information about the topological structure of 
the complex and the functional significance of each mem- 
ber. Furthermore, it traditionally involves breaking open 
cells to extract complexes for analysis, a process that can 
be disruptive to the underlying protein-protein associations. 

BioID is a recently developed technique that complements 
traditional AP-MS-based interactome mapping 
by highlighting intracellular protein "neighbours" in vivo, 
using a proximity labelling/affinity purification 
strategy [8]. It was inspired by the DamID method 
utilized to detect DNA-protein interactions via methyla- 
tion of DNA sequences proximal to a DNA binding protein 
fused to Dam methylase [9]. In the BioID approach, a 
promiscuous prokaryotic biotin protein ligase (BirA*) is 
fused to the protein of interest. When expressed in cells, the 
fusion protein will biotinylate proteins with which it comes 
into close proximity, such as direct binding partners and 
neighbouring proteins in multiprotein complexes (Fig. 1A). 
Importantly, it has the ability to capture both stable 
interactions and transient or weak interactions. Biotinylated 
proteins can be isolated by affinity purification, using a 
streptavidin agarose matrix, and identified by MS. Caveats 
include the inability to distinguish direct vs. indirect 
interactors (similar to AP-MS) and the unknown activity 
radius of BirA*, which would define the resolution of this 
technique. As an initial non-biased screen for in vivo 
associations, however, a major strength is the accessibility 
of this method to a wide range of researchers, in that it 
requires only standard molecular and cell biology techni- 
ques and access to proteomics services. 

Figure 1. Structural analyses of multiprotein complexes 
A. In the BioID approach, fusion of a promiscuous E. coli biotin protein ligase (BirA*) to the protein of interest promotes biotinylation of near-neighbour proteins 
in vivo. These biotinylated proteins can then be captured by affinity purification and identified by mass spectrometry. B. The combination of cross-linking with 
mass spectrometry can provide information about protein-protein interactions and multiprotein complex architecture. Complexes (either in vivo or affinity purified) 
are treated with a bi-functional cross-linking reagent that creates a covalent link between adjacent regions of polypeptide chains. These links can be intra-chain 
(within the same protein; green) or inter-chain (between neighbouring proteins; red). Proteolytic digests are then analyzed by LC-MS/MS to identify cross-linked 
peptides, which in turn provide structural information about the protein complexes. C. A non-radioactive translation-controlled pulse-chase system that 
enables spatiotemporal monitoring of the biogenesis of multiprotein complexes. Left panel: Cells transformed with a plasmid encoding the gene of interest (with a 
C-terminal affinity tag) downstream of an HA tag and amber stop codon (UAG) only synthesize the HA peptide due to translational termination at this premature 
stop codon. Middle panel: Co-expression of an engineered orthogonal pair of amber suppressor tRNA(Ome-Tyr) and tRNA-synthetase allows incorporation of 
the unnatural amino acid O-methyl tyrosine (Ome-Tyr) and suppression of the UAG stop codon, leading to expression of the full HA-protein-affinity tag construct. 
Addition of Ome-Tyr to the media thus induces a translational pulse of tagged protein expression. Right panel: Due to a tetracycline-regulatable riboswitch 
engineered into the 5' UTR of the HA-UAG-gene-affinity tag plasmid, synthesis can then be inhibited at any time by addition of tetracycline (translational chase). 
Affinity purification of the tagged protein and interactome mapping at different time points following the pulse can be used to probe changes in complex composition. 

Higher resolution probing of the topology of multiprotein 
complexes, both in vitro and in vivo, has been 
enabled by the coupling of classic chemical cross-linking 
techniques with recent advances in mass spectrometer 
instrumentation, proteomic methodologies and bioin- 
formatics (for review see [10-13]). Cross-linking pro- 
vides proximity information, revealing not only which 
proteins are getting cross-linked but also at which sites 
the cross-linking takes place (Fig. 1B). The combination 
of cross-linking with mass spectrometry has been 
utilized, both on its own and in combination with other 
structural analysis methods, to probe the architecture of 
complexes such as ribosomes [14], proteasomes [15], 
RNA polymerase II-TFIIF [16], and the protein 
phosphatase 2A (PP2A) network [17]. 
Although challenges for cross-linking/MS include the 
low abundance of cross-linked peptides and the com- 
plexity of the fragmentation spectra, these are being 
addressed by the development of more efficient, affinity 
tag-linked cross-linking reagents and more powerful data 
analysis and quantitation software [12,18]. Incorpora- 
ting quantitative measurements further extends the ability 
of this approach to analyze the dynamic assembly/ 
disassembly and functional composition of multiprotein 
complexes [19]. 
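
The distinction between intra-chain and inter-chain cross-links can be illustrated with a minimal Python sketch. The record layout (CrossLink), the subunit names and the residue numbers are hypothetical and chosen purely for illustration; they are not drawn from any of the studies cited above, and real cross-link identification relies on dedicated search software. Note also that in a homo-oligomer a link between two sites in the same protein sequence may in fact join two separate copies, which this simple name-based classification cannot detect.

```python
from collections import Counter
from typing import NamedTuple


class CrossLink(NamedTuple):
    """One identified cross-linked site pair (hypothetical record layout)."""
    protein_a: str
    residue_a: int
    protein_b: str
    residue_b: int


def classify(link: CrossLink) -> str:
    """Label a link by comparing the protein names of its two sites."""
    return "intra-chain" if link.protein_a == link.protein_b else "inter-chain"


def interaction_counts(links: list[CrossLink]) -> Counter:
    """Count inter-chain links for each unordered pair of proteins."""
    counts: Counter = Counter()
    for link in links:
        if classify(link) == "inter-chain":
            pair = tuple(sorted((link.protein_a, link.protein_b)))
            counts[pair] += 1
    return counts


# Made-up identifications, for illustration only.
links = [
    CrossLink("SubunitA", 45, "SubunitA", 112),
    CrossLink("SubunitA", 45, "SubunitB", 210),
    CrossLink("SubunitB", 33, "SubunitA", 78),
    CrossLink("SubunitB", 150, "SubunitC", 9),
]

for link in links:
    print(link, classify(link))
print(interaction_counts(links))
```

In this toy data set the inter-chain links suggest that SubunitA and SubunitB, and SubunitB and SubunitC, lie close together, while the intra-chain link constrains the fold of SubunitA itself.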

The biological importance of the assembly order of 
multiprotein complexes was recently demonstrated by 
the analysis of evolutionary gene fusion events in a large 
number of species, which revealed that protein com- 
plexes are under evolutionary selection to assemble via 
ordered pathways [20]. Mapping a dynamic assembly 
process using proteomic approaches requires sufficient 
temporal resolution, which can be provided by a recently 
developed protein translation-controlled pulse-chase 
system (Fig. 1C; [21]). In this approach, a pulse of 
de novo synthesis of a tagged protein is followed by a time 
course of affinity purification and interactome mapping 
to reveal dynamic changes in the composition of 
complexes to which it is targeted. High time resolution 
is achieved by controlling protein expression at the level 
of translation. Cells are transformed both with a plasmid 
encoding the gene of interest (flanked by N- and 
C-terminal affinity tags) with an in-frame amber stop 
codon (UAG) inserted just after the N-terminal tag, and 
with a plasmid encoding an engineered orthogonal pair 
of amber suppressor tRNA(Ome-Tyr) and tRNA synthetase. 
In the absence of Ome-Tyr, only the N-terminal tag is 
synthesized because translation halts at the UAG (Fig. 1C). 
Addition of Ome-Tyr to the media allows cells to incor- 
porate this unnatural amino acid at the UAG, leading 
to translation of the full-length fusion protein. A tc-apta 
riboswitch engineered into the 5' end of the transcript 
allows the "pulse" of de novo protein expression to be 
rapidly shut down upon addition of tetracycline, which 
binds the riboswitch and interferes with translation 
initiation. Although this pulse-chase method was deve- 
loped in yeast, incorporation of unnatural amino acids via 
orthogonal amber suppressor tRNA/tRNA synthetase pairs 
(for review see [22]) extends its use to other biological 
systems, including Drosophila [23] and mammalian 
cells [24], highlighting the potentially broad applicability 
of this technique for the spatiotemporal analysis of 
multiprotein complex assembly. 
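
The on/off logic of this pulse-chase system can be summarized in a small Python sketch. It is a deliberately simplified toy model, not the authors' implementation: the transcript is represented as a list of labelled segments rather than real codons, the function and variable names are hypothetical, and translation is treated as all-or-nothing, whereas suppression and shut-off in real cells are unlikely to be perfectly efficient.

```python
AMBER = "UAG"

# Segment order follows the construct described in the text:
# N-terminal tag, in-frame amber stop codon, gene of interest, C-terminal tag.
TRANSCRIPT = ["HA-tag", AMBER, "protein", "affinity-tag"]


def translate(segments, ome_tyr_present, tetracycline_present):
    """Return the segments of the product newly made under the given conditions."""
    if tetracycline_present:
        # Chase: tetracycline binds the riboswitch and initiation is blocked.
        return []
    product = []
    for segment in segments:
        if segment == AMBER:
            if not ome_tyr_present:
                # No suppression: translation terminates at the amber codon.
                break
            # Pulse: the suppressor tRNA inserts the unnatural amino acid.
            product.append("Ome-Tyr")
        else:
            product.append(segment)
    return product


print("no Ome-Tyr:  ", translate(TRANSCRIPT, False, False))
print("pulse:       ", translate(TRANSCRIPT, True, False))
print("chase (+tet):", translate(TRANSCRIPT, True, True))
```

Only the pulse condition yields a product carrying both tags, which is why affinity purification at successive time points after the pulse follows a single cohort of newly made protein.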

Functional analyses of multiprotein complexes 

Valuable clues to the physiological function of a protein 
can be obtained by observing the downstream phenotypic 
effects of removing it from cells or organisms (Fig. 2A). 
At the DNA level, powerful knockout and mutagenesis 
approaches based on homologous recombination and 
zinc finger nucleases have enabled targeted deletion/ 
mutation of specific genes in model systems, including 
yeast, mammalian cells and mice (for review see [25-27]). 
At the RNA level, post-transcriptional gene silencing 
techniques based on RNA interference (RNAi; for review 
see [28,29]) permit the knockdown of specific proteins via 
targeted degradation of their mRNA. A caveat to these 
approaches is that both act upstream of the protein itself, 
which is incorporated within its functional multiprotein 
complex(es). They therefore either entail a time delay until 
protein levels are reduced (which will vary with protein 
turnover rates) or limit studies to the effects of permanent 
removal of the protein. 
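
The dependence of this delay on turnover can be made concrete with a simple calculation. As a rough sketch, assume that synthesis of the protein stops completely and instantly, that the existing pool then decays with first-order kinetics, and that dilution by cell growth is ignored. None of these assumptions comes from the studies cited here, and the function name and example half-lives are hypothetical.

```python
import math


def hours_to_reach(fraction_remaining: float, half_life_h: float) -> float:
    """Time for a first-order decaying pool to fall to the given fraction.

    Solves fraction = 0.5 ** (t / half_life) for t.
    """
    if not 0 < fraction_remaining <= 1:
        raise ValueError("fraction_remaining must be in (0, 1]")
    if half_life_h <= 0:
        raise ValueError("half_life_h must be positive")
    return half_life_h * math.log2(1.0 / fraction_remaining)


# Illustrative half-lives only: a short-lived and a long-lived protein.
for half_life in (2.0, 24.0):
    delay = hours_to_reach(0.10, half_life)
    print(f"half-life {half_life:>4.1f} h -> {delay:.1f} h to reach 10% of the starting level")
```

Under these assumptions the delay scales linearly with half-life, so a stable protein can persist long after its gene or transcript has been targeted.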

In contrast, targeted destruction offers the ability to study 
the acute effects of the immediate removal of a protein. 
One in vivo approach is the spatially and temporally 
defined photo-ablation method Chromophore-Assisted 
Light Inactivation (CALI; for review see [30]), in which 
the target protein is fused to a fluorophore, such as 
KillerRed, that produces substantial amounts of reactive 
oxygen species (ROS) upon absorption of light at a 
particular wavelength. The end result is destruction of the 
target protein in the region of interest, although caveats 
include inadvertent protein cross-linking and inactiva- 
tion of proteins beyond the immediate target by 
diffusion of the ROS. Alternative in vivo methods that 
take advantage of the endogenous ubiquitin-mediated 
protein degradation pathway [31] have been developed, 
and two recent protein knockout techniques, based on 
the SKP1/CUL1/F-box (SCF) protein ubiquitin ligase 
complex, extend both the in vivo applicability and time 
resolution of this type of approach. 

Figure 2. Functional analyses of multiprotein complexes 
A. Functional studies have traditionally relied on disruption of the protein of interest at the level of the gene or mRNA transcript. One drawback to this 
is the lack of real time information, i.e. the immediate effects of the removal of that protein. B. Targeted knockout of GFP fusion proteins. The genetically 
encoded deGradFP targets an F-box domain (NSlmb) to GFP fusion proteins via a single domain anti-GFP nanobody (vhhGFP4). Association of the F-box domain 
with endogenous SCF (SKP1/CUL1/F-box) protein ligase complexes leads to ubiquitination and subsequent proteasome-mediated degradation of the fusion 
protein. C. Inducible, reversible degradation of proteins mediated by a plant auxin-inducible degron (AID). Fusion of AID to the gene of interest and 
co-expression of the plant F-box protein TIR1 facilitates rapid, inducible degradation of the fusion protein upon addition of the auxin hormone indole-3-acetic 
acid (IAA), as TIR1 binds and is incorporated into endogenous SCF complexes. Inclusion of a GFP tag provides localization information and the ability to 
monitor protein loss in live cells. Protein levels recover quickly after removal of IAA. 

By fusing an F-box protein derived from Drosophila 
(Slmb) with a single-domain antibody fragment 
(vhhGFP4) that recognizes green fluorescent protein 
(GFP) and related derivatives, Caussinus and colleagues 
created a genetically encoded method, which they call 
deGradFP, that enables the direct depletion of target 
proteins fused to these fluorophores [32]. Recruitment of 
endogenous SCF complex members by Slmb leads to 
ubiquitination of the fusion protein and its targeted 
degradation by the proteasome (Fig. 2B), which can be 
monitored via fluorescence imaging. Although a major 
strength of this approach is its potential applicability to 
any GFP-tagged construct, the authors noted the failure 
of deGradFP to induce degradation of both free GFP and 
a particular protein incorporated into adherens junc- 
tions. While this suggests possible structural and 
accessibility limitations, it will be easier to judge the 
extent and implications of this as the technique is 
applied to a wider range of substrates. 

Auxin-induced degradation, based on a unique ligand-induced 
degradation pathway in plants, is a related 
approach that can further increase the time resolution of 
protein depletion and is fully reversible [33,34]. In this 
system, a target protein is expressed as a fusion with an 
auxin-inducible degron (AID) in cells that also exogenously 
express the plant F-box protein Transport Inhibitor 
Response 1 (TIR1). Auxin hormones such as indole-3-acetic 
acid (IAA) promote the interaction between AID-containing 
proteins and TIR1, which can in turn recruit 
endogenous SCF proteins and promote ubiquitination 
and proteasome-mediated degradation of the target 
protein (Fig. 2C). Depletion is thus inducible, rapid and 
complete, and fully reversible upon removal of IAA. In 
this study, the authors demonstrate targeted depletion 
of a wide range of substrates localized to different regions 
of mammalian cells, including nucleosomal histones, 
centrosome-, centromere- and telomere-associated proteins 
and cytoplasmic cyclin B1. Degradation could be 
readily monitored by fluorescence imaging, due to an 
additional GFP (or related derivative) tag added to the 
fusion proteins. As with any fusion protein, the addition 
of tags (in this case 25 kDa AID plus 27 kDa GFP) has the 
potential to affect localization and function, and thus 
proper in vivo behavior must first be demonstrated. 
Furthermore, a limitation of both deGradFP and AID-induced 
degradation is that they cannot control the 
stability of endogenous untagged proteins. The continued 
refinement of homologous recombination strategies in 
various model systems does, however, offer the potential 
to extend these technologies by genetically encoding the 
tags in frame with endogenous genes. 

Conclusion 

Taken together, these novel methods for dissecting multi- 
protein complexes in vivo offer unprecedented sensitivity 
and spatiotemporal resolution, which is a large step 
toward the ultimate goal of mapping functional multi- 
protein complexes throughout the cell under a variety of 
conditions. Importantly, most do not require specialized 
knowledge or equipment and are therefore accessible to a 
wide range of researchers. They can also be adapted to 
different model systems and further modified to increase 
their resolution and applicability. 

Abbreviations 

AID, auxin-inducible degron; AP-MS, affinity purification 
followed by mass spectrometry; [...] ligase; TIR1, Transport Inhibitor Response 1. 